Identification of railway subgrade defects based on ground penetrating radar

A recognition method is proposed to address problems in subgrade detection with ground penetrating radar (GPR), such as massive data volumes, time–frequency complexity, and differences in interpreter experience. Because subgrade defects are sparse in radar images, the sparse representation of railway subgrade defects is studied in the time domain and the time–frequency domain using compressive sensing theory. The features of the radar signal are extracted by sparse representation, which reduces the amount of sampled data. Rapid recognition of railway subgrade defects is then realized with fuzzy C-means and a generalized regression neural network. Experimental results show that data redundancy is reduced and identification accuracy is greatly increased.

applied WT to interpret GPR data and evaluate ballast fouling. Ciampoli et al. 3 used both time-frequency and discrete wavelet techniques to evaluate the level of ballast fouling. These methods must process a large amount of raw data and yield redundant feature parameters. The resulting time lag is a considerable drawback, because it conflicts with the rapid evaluation of subgrade condition.
With the development of feature extraction methods such as compressed sensing (CS) and sparse representation, new ways to extract features have become available. Shao et al. 25,26 analyzed the relationship between frequency and standard deviation and obtained a sparse feature vector for ballasted railways. Sun et al. 27 combined sparse scattering with the geometrical features of landmines to detect landmines rapidly. These studies show that sparse methods have a clear advantage for the ballast layer or a single structure and markedly reduce the amount of data. Our study focuses on identifying complex, heterogeneous subgrade defects: it analyzes the features of target echoes and constructs a feature extraction scheme to identify subgrade defects. The "Methodology" section analyzes the sparse characteristics of the spatial structure and introduces a methodology to identify subgrade defects. The "Result analysis and discussion" section describes the rapid identification of railway subgrade defects based on GPR images and verifies the reliability of the proposed algorithm through field experiments. The "Conclusion" section summarizes our study and presents the relevant conclusions.

Methodology
Target sparsity and sparse imaging. For radar signals of fixed frequency, the mixer output is a linear frequency modulation signal, and its frequency signal is as follows: where $A$ is the amplitude of the signal, $f_0$ is the initial frequency, and $k$ is the frequency modulation slope. The echo of a point at a distance of $H$ is as follows: where $\rho$ is the target location, $S(H)$ is the attenuation factor, $c$ is the electromagnetic wave velocity in a vacuum, $i$ is the echo channel number, and $\sigma$ is the target reflection coefficient.
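For orientation, a commonly used complex form of a linear frequency modulation signal that is consistent with the symbols $A$, $f_0$ and $k$ defined above can be written as follows. This is a textbook expression given under assumption, not a reproduction of the formula in this paper; the time variable $t$ and the instantaneous frequency $f_{\mathrm{inst}}$ are introduced here for illustration only.

```latex
s(t) = A \exp\!\left[ j 2\pi \left( f_0 t + \tfrac{1}{2} k t^2 \right) \right],
\qquad
f_{\mathrm{inst}}(t) = \frac{d}{dt}\left( f_0 t + \tfrac{1}{2} k t^2 \right) = f_0 + k t
```

In this form the frequency starts at $f_0$ and grows linearly with slope $k$, which matches the roles given to the two parameters in the text.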
To speed up signal reading and processing, every 25 channels of radar signals form a 256 × 256 pixel image $\varphi(\mu_x, \mu_y)$. The coincidence rate (overlap) between adjacent radar images is 50%. Combined with the sparsity of railway subgrade defects, the relationship between the measurement target and the space images is as follows: where $\pi_T(x, y, z)$ is the spatial position of the measurement target, $d(\mu_x, \mu_y, f)$ is the frequency-space image, and $\psi$ is the space transformation basis matrix, that is, the dictionary.
Sparse matrix of radar signals. To establish the sparse matrix, the measurement target must be discretized in spatial position. Here $i$ is the echo channel number and $(x_i, y_i, z_i)$ is the spatial position; the image space $B$ is formed correspondingly by $N$ pixels $\{\pi_1, \pi_2, \ldots, \pi_N\}$, and each pixel $\pi_i$ corresponds to the three-dimensional vector $(x_i, y_i, z_i)$. The ith vector radix of pixel $\pi_i$ is as follows: where $\omega$ is the frequency vector B, and the dictionary matrix corresponding to the echo channel is obtained through Formula (4). The echo of P targets is received by the ith echo channel: Formula (6) is converted to a vector, where $b$ is the weighted steering vector of the target space; if pixel $\pi_j$ lies at the position of a measurement target, $b_j = A\sigma_j \hat{W}(\pi_j)$; otherwise, $b_j = 0$.
Assume that the collected radar signals consist of a large number of one-dimensional signals of length $L$ with a sparsity of $k$ (that is, each contains $k$ nonzero values), which together form a large quantity of sample data. Because railway subgrade defects are sparse in space, this paper uses compressive sensing for data sampling, in which a small number of measurements represents the full signal, to construct the target images. The measurement matrix should be a random matrix that is uncorrelated with the dictionary. In this paper, a Bernoulli matrix composed of 0 and 1 elements is selected as the observation matrix. Accordingly, $M$ random rows are extracted from the $L \times L$ identity matrix to obtain the measurement matrix $\phi_i$ corresponding to the ith echo channel. The measured value is as follows: The measurement matrix $\phi_i$ of each echo channel is different. To obtain the space guidance vector $b$, $K$ echo channels are selected. The dictionary matrix is $\psi = [\psi_1^T, \psi_2^T, \ldots, \psi_K^T]^T$, the measurement matrix is $\phi = \mathrm{diag}[\phi_1, \phi_2, \ldots, \phi_K]$, the measurement value is $\beta = [\beta_1^T, \beta_2^T, \ldots, \beta_K^T]^T$, and the reconstruction of $b$ becomes a constrained convex optimization problem.
Formula (8) holds only in the absence of noise. With noise, the measured value corresponding to the ith echo channel is as follows: where $\mu_i = \phi_i n_i \sim N(0, \sigma^2)$ and $n_i \sim N(0, \sigma^2)$ is aliasing noise. The convex optimization of the improved L1 norm under constraint conditions is as follows: where $A = \phi\psi$ and ε = σ 2 lg N. Formula (10) can be used to create the target images.
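The sampling and recovery steps described above can be illustrated with a small pure-Python sketch. It draws M random rows of the identity matrix as the measurement matrix, forms A = ϕψ, and recovers a sparse vector with iterative soft thresholding (ISTA) applied to an L1-regularized least-squares problem. ISTA and the regularized form are swapped in here for illustration and are not claimed to be the improved L1-norm formulation of Formula (10); the dictionary `psi`, the vector `b_true`, the weight `lam` and all function names are hypothetical.

```python
import random

def random_rows(L, M, seed=None):
    """Indices of M distinct rows drawn from the L x L identity matrix."""
    return sorted(random.Random(seed).sample(range(L), M))

def sensing_matrix(psi, rows):
    """A = phi @ psi; a row-selection phi simply keeps those rows of psi."""
    return [list(psi[r]) for r in rows]

def matvec(A, x):
    """Compute A @ x for a list-of-lists matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T @ r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def objective(A, beta, b, lam):
    """0.5 * ||beta - A b||_2^2 + lam * ||b||_1."""
    res = [y - p for y, p in zip(beta, matvec(A, b))]
    return 0.5 * sum(r * r for r in res) + lam * sum(abs(v) for v in b)

def ista(A, beta, lam, n_iter=300):
    """Iterative soft thresholding for the L1-regularized problem."""
    # The squared Frobenius norm bounds the squared spectral norm of A,
    # so 1 / lipschitz is a safe (conservative) step size.
    lipschitz = sum(a * a for row in A for a in row)
    b = [0.0] * len(A[0])
    for _ in range(n_iter):
        res = [y - p for y, p in zip(beta, matvec(A, b))]
        step = rmatvec(A, res)  # A^T (beta - A b)
        b = [soft_threshold(bj + sj / lipschitz, lam / lipschitz)
             for bj, sj in zip(b, step)]
    return b

# Hypothetical 6 x 6 dictionary and a target vector with a single nonzero entry.
psi = [[1.0 if i == j else 0.3 for j in range(6)] for i in range(6)]
b_true = [0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
rows = random_rows(L=6, M=4, seed=1)
A = sensing_matrix(psi, rows)
beta = matvec(A, b_true)  # noise-free measurements
b_hat = ista(A, beta, lam=0.05)
```

Because the step size is conservative, the objective value never increases from one iteration to the next; how closely `b_hat` matches `b_true` depends on the dictionary, the number of rows kept, and `lam`.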
Feature representation of railway subgrade defects. Based on the CS (compressed sensing) algorithm, the subgrade defects are sparse in their graphical distribution. The identification parameters of typical defects are as follows: (1) peaks of the multiscale wavelet energy spectrum of the subgrade; (2) time-domain features, such as the energy per block, the variance per block, and the demixing points per block.

Feature extraction of GPR signals based on the time domain.
Based on the continuity and disorder of the phase axes of the subgrade, the time-domain characteristics of the subgrade defects are established. A signal of length $N$ is divided into $M$ blocks, and each block image is divided into $K$ segments by length. The coincidence rate (overlap) between adjacent images is 50%. The features of the subgrade are as follows: where $i = 0, 1, 2, \ldots, K - 1$; $E_i$ is the energy of the ith segment; $A_j$ is the amplitude of the jth sample; $\sigma_i^2$ is the sample variance of the ith segment; and $\bar{A}_i$ is the mean amplitude of the ith segment.
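A minimal sketch of these per-segment features is given here, assuming that the segment energy is the sum of squared amplitudes and that the variance is the unbiased sample variance; the exact definitions belong to the formulas of this section, and the function names and the example trace are hypothetical.

```python
def split_with_overlap(samples, seg_len, overlap=0.5):
    """Cut a trace into segments of seg_len samples with fractional overlap."""
    step = max(1, int(seg_len * (1 - overlap)))
    last_start = len(samples) - seg_len
    return [samples[s:s + seg_len] for s in range(0, last_start + 1, step)]

def segment_features(segment):
    """Energy, mean amplitude and sample variance of one segment."""
    n = len(segment)
    energy = sum(a * a for a in segment)  # assumed: sum of squared amplitudes
    mean = sum(segment) / n
    if n > 1:
        variance = sum((a - mean) ** 2 for a in segment) / (n - 1)
    else:
        variance = 0.0
    return {"energy": energy, "mean": mean, "variance": variance}

# Hypothetical amplitude trace; 50% overlap as stated in the text.
trace = [1.0, -1.0, 2.0, -2.0, 3.0, -3.0, 0.0, 0.0]
segments = split_with_overlap(trace, seg_len=4, overlap=0.5)
features = [segment_features(s) for s in segments]
```

With a segment length of 4 and 50% overlap, consecutive segments start two samples apart, so the eight-sample trace yields three segments.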
Horizontal energy spectrum. The characteristics of the subgrade radar signal differ at each scale, and each scale contributes differently to the total energy. The main part of the signal is identified from the characteristics of the energy spectrum. The component energy of the wavelet decomposition at the Jth scale is as follows: where $A_J f(n)$ is the low-frequency reconstructed signal at the Jth wavelet decomposition, and $D_J f(n)$ is the high-frequency reconstructed signal at the Jth wavelet decomposition. $E_{A_J f(n)}$ and $E_{D_J f(n)}$ are the low- and high-frequency signal energies, respectively, at the Jth wavelet decomposition.
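The idea of a per-scale energy spectrum can be sketched with an orthonormal Haar decomposition. The Haar wavelet is an assumption made for brevity, since the text does not name the wavelet used; for an orthonormal wavelet the energy of the coefficients at a scale equals the energy of the corresponding reconstructed component. The function names and the example trace are hypothetical.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_energy_spectrum(signal, levels):
    """Return (detail energy at scales 1..levels, approximation energy)."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    detail_energy = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        detail_energy.append(sum(d * d for d in detail))
    approx_energy = sum(a * a for a in approx)
    return detail_energy, approx_energy

# Hypothetical trace of eight samples, decomposed over three scales.
trace = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
detail_energy, approx_energy = wavelet_energy_spectrum(trace, levels=3)
peak_scale = detail_energy.index(max(detail_energy)) + 1
```

The detail energies plus the final approximation energy add up to the total signal energy, so the location of the peak shows which scale carries most of the high-frequency content.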
Sparse matrix. Based on the L1-norm optimization method, the training sample matrix is constructed from the eigenvalues of subgrade defects and used as the data dictionary of the sparse representation. The identification flow for subgrade defects is shown in Fig. 1.

Target detection and identification method. Identification of subgrade defects based on FCM. The FCM algorithm is as follows: (1) The subgrade defects are divided into three categories (sinkhole, mud pumping, and settlement), and the fuzzy weight index is determined;
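For reference, the standard fuzzy C-means iteration alternates between updating the membership matrix (u) and the cluster centers (v). The sketch below follows that textbook form with three clusters and a fuzzy weight index m = 2; the initial centers, the two-dimensional feature points and the function names are hypothetical, and the remaining steps of the procedure used in this work may differ.

```python
import math

def memberships(points, centers, m=2.0, eps=1e-12):
    """Fuzzy membership of every point in every cluster (rows sum to 1)."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [max(math.dist(p, c), eps) for c in centers]
        u.append([1.0 / sum((d[i] / d[k]) ** power for k in range(len(d)))
                  for i in range(len(d))])
    return u

def update_centers(points, u, m=2.0):
    """Membership-weighted mean of the points for each cluster."""
    dims = len(points[0])
    centers = []
    for i in range(len(u[0])):
        w = [row[i] ** m for row in u]
        total = sum(w)
        centers.append([sum(wj * p[d] for wj, p in zip(w, points)) / total
                        for d in range(dims)])
    return centers

def fcm(points, init_centers, m=2.0, n_iter=50):
    """Run a fixed number of FCM iterations from the given initial centers."""
    centers = [list(c) for c in init_centers]
    for _ in range(n_iter):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)

# Hypothetical 2-D feature points forming three groups
# (standing in for sinkhole, mud pumping and settlement).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
          (10.0, 0.0), (10.1, 0.2), (9.8, 0.1)]
centers, u = fcm(points, [points[0], points[3], points[6]])
labels = [row.index(max(row)) for row in u]
```

Each point is assigned to the cluster with the largest membership; a larger m makes the memberships softer, which is why the fuzzy weight index has to be fixed before clustering.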

Identification of subgrade defects based on FCM-GRNN.
Because railway subgrade defects have fuzzy boundaries and generate large volumes of data, the FCM and GRNN algorithms are combined to identify them, as shown in Fig. 2. The specific algorithm is as follows: (1) Based on the FCM result, the GRNN is used to predict the type of the training samples; (2) the corresponding mean value center (v) and the distance between each individual (subgrade defect type) in the class and the mean value center are recalculated, and the data closest to the center are selected as the training samples of the network; (3) after repeated calculations, the final network cluster is obtained.
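Two building blocks of the steps above can be sketched as follows: selecting the samples closest to a cluster center as training data, and a GRNN prediction, which in its usual form is a Gaussian-kernel weighted average of the training targets. The smoothing parameter `sigma`, the one-hot class encoding, the feature values and the function names are assumptions for illustration, not the settings used in this work.

```python
import math

def nearest_to_center(samples, center, n_keep):
    """Keep the n_keep samples closest to a cluster center."""
    return sorted(samples, key=lambda s: math.dist(s, center))[:n_keep]

def grnn_predict(train_x, train_y, x, sigma=0.5):
    """Gaussian-kernel weighted average of the training targets."""
    dists = [math.dist(x, t) for t in train_x]
    weights = [math.exp(-d * d / (2.0 * sigma ** 2)) for d in dists]
    total = sum(weights)
    if total == 0.0:
        # All kernels underflowed: fall back to the nearest training target.
        return list(train_y[dists.index(min(dists))])
    n_out = len(train_y[0])
    return [sum(w * y[j] for w, y in zip(weights, train_y)) / total
            for j in range(n_out)]

def classify(train_x, train_y, x, sigma=0.5):
    """Index of the class with the largest GRNN output."""
    out = grnn_predict(train_x, train_y, x, sigma)
    return out.index(max(out))

# Hypothetical 2-D features with one-hot targets for three defect classes.
train_x = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0),
           (5.2, 4.9), (10.0, 0.0), (9.8, 0.1)]
train_y = [[1, 0, 0], [1, 0, 0], [0, 1, 0],
           [0, 1, 0], [0, 0, 1], [0, 0, 1]]
queries = [(0.3, 0.2), (4.8, 5.1), (9.7, 0.3)]
pred = [classify(train_x, train_y, q) for q in queries]
kept = nearest_to_center([(0, 0), (1, 1), (3, 3)], (0.9, 0.9), 2)
```

A GRNN needs no iterative weight training, so re-selecting the training samples after each recalculation of the centers is inexpensive; only `sigma` has to be tuned.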

Result analysis and discussion
Sections of the Daqin and Shichang railways were surveyed with GPR mounted at the bottom of a rail inspection vehicle, as shown in Fig. 3a. To meet the requirements for maximum detection depth and depth resolution, 100 and 400 MHz radar antennas were used to survey the railway subgrade, as shown in Fig. 3b.

The GPR system
(1) Time-domain features Changes in the energy and the phase axis are obvious at subgrade defects, and the spatial location, energy, and variance of defects differ from those of the normal subgrade, as shown in Fig. 4. The energy per block and the variance per block distinguish the normal subgrade from the defects. At settlements, the phase axes clearly decline; at faults, the energy increases markedly, and the variance and demixing points per block differ from those of settlement. The interfaces of mud pumping, however, become vague. In addition, the high conductivity of mud pumping lowers the energy of the radar image. The subgrade defects can therefore be identified from the energy and variance of the radar images.
(2) Horizontal energy spectrum The wavelet energy spectrum in scale 18 was built by combining wavelet multi-scale decomposition with power spectrum analysis to reduce the sample data, as shown in Fig. 5. The energy spectrum shows that the characteristic peaks of the normal subgrade, sinkhole, and settlement are all at scale 8, whereas the characteristic peak of mud pumping is at scale 6. The energy spectrum of the normal subgrade is as high as $2.5 \times 10^{-4}$ J/Hz, and that of mud pumping is as low as 160 J/Hz. The energy spectra of settlement and sinkholes are so similar that the two are difficult to distinguish.
(3) Analysis of sparsity A total of 100 blocks of subgrade defects are selected as test samples and grouped into three categories: sinkhole, mud pumping, and settlement, corresponding to the first, second, and third categories, respectively. A total of 73 features are extracted from the time domain and the energy spectrum. The visualization of subgrade defects in different dimensions is shown in Fig. 6. All the extracted 32-dimensional eigenvalues are clustered and used as feature vectors.
Taking subgrade settlement as an example, the sparsity in the subgrade defect dictionary is analyzed, as shown in Fig. 7. Based on the L1 minimum norm method, the sparse coefficients of settlement are calculated in the settlement dictionary matrix $A_1$, the sinkhole matrix $A_2$, and the mud pumping matrix $A_3$; the dictionary matrix $A$ is made up of the 32-dimensional eigenvalues. The settlement sample is sparse in dictionary $A_1$, where most coefficients are 0, as shown in Fig. 7c, but it is not sparse in dictionary $A_2$ or $A_3$, as shown in Fig. 7a and b.
The feasibility and accuracy of the sparse representation are assessed by comparing the CS images, the restored images, and the original radar images, as shown in Fig. 8. The radar images of the subgrade, including normal subgrade, settlement, sinkhole, and mud pumping, are shown in Fig. 8a. The amount of data is clearly reduced. Restored images based on sparse representation are shown in Fig. 8b, and partial data loss has little effect on the CS imaging results. Comparing Fig. 8b and c shows that the CS algorithm accomplishes the target detection.
Identification of subgrade defects. Figure 9 shows that the FCM-GRNN algorithm converges faster when it starts from the clustering center (v) and the individual fuzzy membership matrix (u) trained by the FCM algorithm. The training error converges gradually. Figure 10 shows the classification accuracy obtained by FCM and FCM-GRNN, respectively. Detailed information about the recognition rates is given in the confusion matrices. The recognition rates vary significantly across classes in Fig. 10a but only weakly in Fig. 10b, so the classification results depend on the recognition method. We conclude that FCM-GRNN achieves higher classification accuracy, whereas FCM alone does not classify subgrade defects efficiently. The clustering center (v) and the individual fuzzy membership matrix (u) are first obtained by FCM; v is then recalculated and a new u is obtained by FCM-GRNN, which improves the clustering results.
The Daqin Railway subgrade is chosen as the target. A total of 1084 subgrade sinkholes, 970 mud pumping defects, and 1534 subgrade settlements are selected as the test samples. Table 1 shows that railway subgrade defects are effectively identified by the FCM and FCM-GRNN algorithms. The accuracy rate of the FCM-GRNN algorithm reaches 100% for both settlement and mud pumping. The accuracy rate for subgrade sinkholes by FCM-GRNN is 59.1%, which is more accurate than the result obtained by FCM.
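Per-class recognition rates of the kind read from a confusion matrix can be computed as sketched here, assuming rows hold the true class and columns the predicted class. The counts are hypothetical placeholders and are not the results of Fig. 10 or Table 1.

```python
def recognition_rates(confusion):
    """Per-class rate: diagonal count divided by the row total."""
    rates = []
    for i, row in enumerate(confusion):
        total = sum(row)
        rates.append(row[i] / total if total else 0.0)
    return rates

def overall_accuracy(confusion):
    """Share of all samples that lie on the diagonal."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total if total else 0.0

# Hypothetical counts only. Rows: true class; columns: predicted class.
confusion = [
    [8, 1, 1],
    [0, 10, 0],
    [2, 0, 8],
]
rates = recognition_rates(confusion)
accuracy = overall_accuracy(confusion)
```

Reporting the per-class rates alongside the overall accuracy shows whether one defect type is recognized much less reliably than the others.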
To verify the identification method, the Daqin railway was surveyed by GPR, as shown in Fig. 11. The normal subgrade is shown in Fig. 11a, and the railway subgrade contains defects such as settlement (Fig. 11b), sinkhole (Fig. 11c), and mud pumping (Fig. 11d). Figure 11b shows an obvious semi-parabolic feature at the edge of the stage and a linear feature at the bottom. Figure 11c shows that the phase axis is lower than the normal axis and that the affected range is relatively small. The defects are inferred to be the remains of an artificial mining cave. Under long-term traffic loading and water erosion, the structure of the rock and soil is gradually destroyed and its load-carrying capacity gradually decreases, eventually leading to collapse. The collapse forms a loose area, which results in subgrade settlement, and water-enriched regions form mud pumping. Figure 11d shows a region of strongly reflected signals in which the phase axis does not exist.

Conclusion
Railway subgrade defects are sparse in radar images, which satisfies the requirements of sparse theory. The demixing points, energy, and variance per block are obtained as time-domain eigenvalues, and the wavelet multiscale energy spectrum is acquired; together they make up the data dictionary. The optimal sparse radar feature is established based on the L1 minimum norm method. Fuzzy C-means (FCM) and the generalized regression neural network (GRNN) are used as the recognition algorithms for subgrade defects. FCM-GRNN simulation and field experiments show that the classification accuracy of sinkhole, mud pumping, and settlement is 100%, 100%, and 59.1%, respectively.
This study combines sparse theory with field experiments and obtains sparse features to identify subgrade defects. The identification method overcomes the influence of redundant data and promotes the application of GPR to the detection of railway subgrade defects. However, the classification accuracy of settlement is relatively low. Hence, identification methods for settlement should be studied further.

Data availability
All data, models, and code generated or used during the study appear in the submitted article.

Code availability
Source code implementing the algorithm described in this work can be obtained from the following git repository: http://www.ilovematlab.com/. This software is written in Matlab (version R2016a). It should run on any modern Linux x86 computer supporting the aforementioned packages.
www.nature.com/scientificreports/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.